Tight Bounds on Profile Redundancy and Distinguishability
Abstract
The minimax KL-divergence of any distribution from all distributions in a collection P has several practical implications. In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. In online estimation and learning, it is the lowest expected log-loss regret when guessing a sequence of random values generated by a distribution in P. In hypothesis testing, it upper bounds the largest number of distinguishable distributions in P. Motivated by problems ranging from population estimation to text classification and speech recognition, several machine-learning and information-theory researchers have recently considered label-invariant observations and properties induced by i.i.d. distributions. A sufficient statistic for all these properties is the data's profile, the multiset of the number of times each data element appears. Improving on a sequence of previous works, we show that the redundancy of the collection of distributions induced over profiles by length-n i.i.d. sequences is between 0.3 · n and n log n, in particular establishing its exact growth power.
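To illustrate the profile statistic mentioned above, here is a minimal sketch (not from the paper) of computing a sequence's profile: the multiset of symbol counts, which ignores symbol identities, so two sequences have the same profile exactly when one is a relabeling of the other.

```python
from collections import Counter

def profile(sequence):
    """Return the profile of a sequence: the multiset (represented here as a
    sorted list) of the number of times each distinct element appears.
    Symbol identities are discarded; only their multiplicities remain."""
    return sorted(Counter(sequence).values(), reverse=True)

# a:5, b:2, r:2, c:1, d:1
print(profile("abracadabra"))  # [5, 2, 2, 1, 1]
# i:4, s:4, p:2, m:1
print(profile("mississippi"))  # [4, 4, 2, 1]
```

Since the profile is label-invariant, `profile("aab")` and `profile("xxy")` agree, which is why it is a sufficient statistic for the label-invariant properties the abstract discusses.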
Similar resources
On Lower Bounds for the Redundancy of Optimal Codes
The problem of providing bounds on the redundancy of an optimal code for a discrete memoryless source, in terms of the probability distribution of the source, has been extensively studied in the literature. The attention has mainly focused on binary codes for the case when the most or the least likely source letter probabilities are known. In this paper we analyze the relationships among tight l...
Redundancy-Related Bounds on Generalized Huffman Codes
This paper presents new lower and upper bounds for the compression rate of optimal binary prefix codes on memoryless sources according to various nonlinear codeword length objectives. Like the most well-known redundancy bounds for minimum (arithmetic) average redundancy coding — Huffman coding — these are in terms of a form of entropy and/or the probability of the most probable input symbol. Th...
Improved Redundancy Bounds for Exponential Objectives
We present new lower and upper bounds for the compression rate of binary prefix codes optimized over memoryless sources according to two related exponential codeword length objectives. The objectives explored here are exponential-average length and exponential-average redundancy. The first of these relates to various problems involving queueing, uncertainty, and lossless communications, and it ...
Tight Bounds on the Average Length, Entropy, and Redundancy of Anti-Uniform Huffman Codes
In this paper we consider the class of anti-uniform Huffman codes and derive tight lower and upper bounds on the average length, entropy, and redundancy of such codes in terms of the alphabet size of the source. The Fibonacci distributions are introduced which play a fundamental role in AUH codes. It is shown that such distributions maximize the average length and the entropy of the code for a ...
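For readers unfamiliar with these three quantities, the following minimal sketch (not specific to the anti-uniform case) computes the average length, entropy, and redundancy of a binary Huffman code for a given source distribution:

```python
import heapq
import itertools
import math

def huffman_lengths(probs):
    """Codeword lengths of an optimal binary prefix (Huffman) code, built by
    repeatedly merging the two least probable subtrees."""
    tiebreak = itertools.count(len(probs))  # avoids comparing lists on ties
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1  # every merge adds one bit to all leaves below
        heapq.heappush(heap, (p1 + p2, next(tiebreak), leaves1 + leaves2))
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]  # dyadic source: redundancy should be 0
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))  # average length
H = -sum(p * math.log2(p) for p in probs)                      # entropy
print(L - H)  # redundancy = average length minus entropy
```

For a dyadic source the optimal lengths are exactly -log2(p), so the redundancy is 0; for general sources it lies strictly between 0 and 1.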
Tight Bounds for Gomory-Hu-like Cut Counting
By a classical result of Gomory and Hu (1961), in every edge-weighted graph G = (V, E, w), the minimum st-cut values, when ranging over all s, t ∈ V, take at most |V| − 1 distinct values. That is, these (|V| choose 2) instances exhibit redundancy factor Ω(|V|). They further showed how to construct from G a tree (V, E′, w′) that stores all minimum st-cut values. Motivated by this result, we obtain tight...
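The Gomory-Hu observation can be checked directly on a small example. This sketch (assuming only max-flow/min-cut duality, with an illustrative 5-vertex cycle graph) counts the distinct minimum st-cut values over all vertex pairs:

```python
from collections import deque
from itertools import combinations

def max_flow(n, capacity, s, t):
    """Edmonds-Karp max flow on an n-vertex graph given as a capacity matrix.
    By max-flow/min-cut duality, this equals the minimum s-t cut value."""
    cap = [row[:] for row in capacity]  # residual capacities
    flow = 0
    while True:
        parent = [-1] * n  # BFS for a shortest augmenting path
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left
        v, bottleneck = t, float("inf")
        while v != s:  # find the bottleneck capacity along the path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # push the bottleneck flow along the path
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# A small weighted cycle on 5 vertices.
n = 5
cap = [[0] * n for _ in range(n)]
def add_edge(u, v, w):  # undirected weighted edge
    cap[u][v] += w
    cap[v][u] += w
for u, v, w in [(0, 1, 3), (1, 2, 2), (2, 3, 4), (3, 4, 1), (0, 4, 2)]:
    add_edge(u, v, w)

values = {max_flow(n, cap, s, t) for s, t in combinations(range(n), 2)}
print(len(values) <= n - 1)  # at most |V|-1 distinct min-cut values: True
```

Here the 10 vertex pairs yield only 3 distinct min-cut values, consistent with the |V| − 1 = 4 bound.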